    Conceptualization and scalable execution of big data workflows using domain-specific languages and software containers

    Big Data processing, especially with the increasing proliferation of Internet of Things (IoT) technologies and the convergence of IoT, edge, and cloud computing, involves handling massive and complex data sets on heterogeneous resources and incorporating different tools, frameworks, and processes to help organizations make sense of the data they collect from various sources. This set of operations, referred to as Big Data workflows, requires taking advantage of the elasticity of Cloud infrastructures for scalability. In this article, we present the design and prototype implementation of a Big Data workflow approach based on software container technologies, message-oriented middleware (MOM), and a domain-specific language (DSL) to enable highly scalable workflow execution and abstract workflow definition. We demonstrate our system in a use case and a set of experiments that show the practical applicability of the proposed approach for the specification and scalable execution of Big Data workflows. Furthermore, we compare the scalability of our proposed approach with that of Argo Workflows – one of the most prominent tools in the area of Big Data workflows – and provide a qualitative evaluation of the proposed DSL and the overall approach with respect to the existing literature.
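
    Since the abstract describes the container-plus-MOM execution model only in prose, the following is a minimal, illustrative sketch (not the authors' actual implementation) of how one containerized workflow step could consume tasks from a message queue and forward results to the queue of the next step. It assumes RabbitMQ and the Python pika client as a stand-in for the unspecified message-oriented middleware; the step name, queue names, and broker hostname are all hypothetical. Scaling a step out would then amount to running more container replicas of this consumer on the same input queue.

    # Sketch of one workflow-step worker; RabbitMQ/pika are assumptions,
    # not the middleware named in the article.
    import json
    import pika

    STEP_NAME = "clean-data"          # hypothetical step identifier
    INPUT_QUEUE = "wf.clean-data.in"  # hypothetical queue naming scheme
    OUTPUT_QUEUE = "wf.aggregate.in"  # queue of the (hypothetical) next step


    def process(record: dict) -> dict:
        """Placeholder for the step's actual analytics logic."""
        record["cleaned"] = True
        return record


    def on_message(channel, method, properties, body):
        result = process(json.loads(body))
        # Hand the result to the next step's queue, then acknowledge the
        # input message so the broker can safely remove it.
        channel.basic_publish(exchange="", routing_key=OUTPUT_QUEUE,
                              body=json.dumps(result))
        channel.basic_ack(delivery_tag=method.delivery_tag)


    def main():
        connection = pika.BlockingConnection(
            pika.ConnectionParameters(host="rabbitmq"))  # assumed broker host
        channel = connection.channel()
        channel.queue_declare(queue=INPUT_QUEUE, durable=True)
        channel.queue_declare(queue=OUTPUT_QUEUE, durable=True)
        # Process one message at a time so work spreads evenly across replicas.
        channel.basic_qos(prefetch_count=1)
        channel.basic_consume(queue=INPUT_QUEUE, on_message_callback=on_message)
        channel.start_consuming()


    if __name__ == "__main__":
        main()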

    Big Data Workflows: DSL-based Specification and Software Containers for Scalable Execution

    Big Data workflows are composed of multiple orchestration steps that perform different data analytics tasks. These tasks process heterogeneous data using various computing and storage resources. Due to the diversity of application domains, the technologies involved, and the complexity of the data sets, the design and implementation of Big Data workflows require collaboration between domain experts and technical experts. However, existing tools are too technical and do not easily allow domain experts to participate in defining and executing Big Data workflows. Moreover, the majority of existing tools are designed for specific applications such as bioinformatics, computational chemistry, and genomics, and are based on specific technology stacks that do not provide flexible means of code reuse and maintenance. This thesis presents the design and implementation of a Big Data workflow solution that uses a domain-specific language (DSL) to hide complex technical details, enabling domain experts to participate in defining workflows. The solution combines software container technologies and message-oriented middleware (MOM) to enable highly scalable workflow execution. Its applicability is demonstrated by implementing a prototype based on a real-world data workflow. The evaluations performed show that the proposed solution provides efficient workflow definition and scalable execution. Furthermore, the thesis presents the results of a set of experiments comparing the performance of the proposed approach with that of Argo Workflows, one of the most promising tools in the area of Big Data workflows.
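
    The abstract does not reproduce the DSL's concrete syntax, so the sketch below is only a hypothetical illustration, written in plain Python rather than the actual DSL, of the level of abstraction it aims for: a domain expert lists the steps, the container image implementing each step, and an optional degree of scale-out, while the queue wiring between consecutive steps is derived automatically. The step names, image references, and queue naming scheme are assumptions made for the example.

    # Illustrative only: a DSL-like workflow definition and a tiny "compiler"
    # that wires consecutive steps together via broker queues. Not the thesis'
    # actual DSL; all identifiers are hypothetical.
    from dataclasses import dataclass


    @dataclass
    class Step:
        name: str          # domain-level step name
        image: str         # container image implementing the step
        replicas: int = 1  # desired degree of scale-out


    workflow = [
        Step("ingest",    image="registry.example/ingest:latest"),
        Step("clean",     image="registry.example/clean:latest", replicas=4),
        Step("aggregate", image="registry.example/aggregate:latest"),
    ]


    def wire_queues(steps):
        """Derive per-step input/output queue names so consecutive steps are
        connected through the message broker (hypothetical naming scheme)."""
        plan = []
        for current, nxt in zip(steps, steps[1:] + [None]):
            plan.append({
                "step": current.name,
                "image": current.image,
                "replicas": current.replicas,
                "input_queue": f"wf.{current.name}.in",
                "output_queue": f"wf.{nxt.name}.in" if nxt else None,
            })
        return plan


    if __name__ == "__main__":
        for entry in wire_queues(workflow):
            print(entry)

    Running the module prints one execution-plan entry per step; in a system of the kind described above, such a plan would be handed to the container platform and the message broker, so that the domain expert never touches either directly.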
